Day 18: Fine-Tuning Concepts and Strategies
Fine-tuning is the process of adapting a pre-trained LLM to your data and use case. However, fine-tuning is not always the answer. Today we learn the criteria for deciding when and how to fine-tune.
Prompting vs Fine-Tuning Decision
Before fine-tuning, first check whether prompt engineering can solve the problem. Use the following criteria to decide.
# Decision tree expressed as code
decision_tree = {
    "Does the current model + prompt produce the desired quality?": {
        "Yes": "Fine-tuning not needed. Focus on prompt optimization",
        "No": {
            "Does few-shot + RAG improve results?": {
                "Yes": "Build a RAG pipeline (recommended)",
                "No": {
                    "Is domain-specific knowledge needed?": {
                        "Yes": "Proceed with fine-tuning (LoRA/QLoRA recommended)",
                        "No": {
                            "Do you need to change output format/style?": {
                                "Yes": "Proceed with SFT (Supervised Fine-Tuning)",
                                "No": "Consider switching to a larger model",
                            }
                        },
                    }
                },
            }
        },
    }
}
# Decision criteria summary
criteria = [
    ("Prompting is sufficient", "General questions, simple format conversion, translation"),
    ("RAG is appropriate", "Up-to-date information needed, internal document-based Q&A"),
    ("Fine-tuning needed", "Domain-specific terminology, specific output style, consistent persona"),
]
for method, usecase in criteria:
    print(f"[{method}] {usecase}")
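The nested dict above can also be traversed programmatically by supplying "Yes"/"No" answers until a recommendation is reached. A minimal sketch (the tree is repeated so the snippet runs on its own; `walk` is an illustrative helper, not a library function):

```python
# Decision tree from above, repeated so this snippet is self-contained
decision_tree = {
    "Does the current model + prompt produce the desired quality?": {
        "Yes": "Fine-tuning not needed. Focus on prompt optimization",
        "No": {
            "Does few-shot + RAG improve results?": {
                "Yes": "Build a RAG pipeline (recommended)",
                "No": {
                    "Is domain-specific knowledge needed?": {
                        "Yes": "Proceed with fine-tuning (LoRA/QLoRA recommended)",
                        "No": {
                            "Do you need to change output format/style?": {
                                "Yes": "Proceed with SFT (Supervised Fine-Tuning)",
                                "No": "Consider switching to a larger model",
                            }
                        },
                    }
                },
            }
        },
    }
}

def walk(tree, answers):
    """Follow a list of "Yes"/"No" answers down the tree to a recommendation."""
    node = tree
    for answer in answers:
        question = next(iter(node))  # each level holds exactly one question
        node = node[question][answer]
        if isinstance(node, str):    # reached a recommendation leaf
            return node
    raise ValueError("more answers needed to reach a recommendation")

print(walk(decision_tree, ["No", "No", "Yes"]))
# prints: Proceed with fine-tuning (LoRA/QLoRA recommended)
```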
Full Fine-Tuning vs Parameter-Efficient Fine-Tuning
Full fine-tuning updates all of the model's parameters. It is effective but requires enormous GPU memory. PEFT (Parameter-Efficient Fine-Tuning) instead trains only a small fraction of the parameters (often under 1%) and achieves comparable results.
# Comparison table by fine-tuning method
comparison = {
    "Method": ["Full FT", "LoRA", "QLoRA", "Prompt Tuning"],
    "Trained params": ["100%", "0.1~1%", "0.1~1%", "<0.01%"],
    "GPU memory": ["7B=28GB+", "7B=16GB", "7B=6GB", "7B=16GB"],
    "Training speed": ["Slow", "Medium", "Medium", "Fast"],
    "Performance": ["Best", "Very good", "Good", "Limited"],
    "Recommended": ["Large budget", "General", "GPU limited", "Quick experiments"],
}

# Print table
for key, values in comparison.items():
    print(f"{key:16} | {' | '.join(values)}")
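The "0.1~1%" figure for LoRA can be sanity-checked with back-of-envelope arithmetic. A sketch assuming a typical 7B-class architecture (32 layers, hidden size 4096) with LoRA applied only to the q/v attention projections, a common default; `lora_param_count` is an illustrative helper:

```python
# Each adapted d_out x d_in weight gains two low-rank factors:
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) new parameters.
def lora_param_count(n_layers=32, hidden=4096, rank=16, matrices_per_layer=2):
    per_matrix = rank * (hidden + hidden)  # square projections: d_in = d_out
    return n_layers * matrices_per_layer * per_matrix

trainable = lora_param_count()
total = 7_000_000_000
print(f"Trainable: {trainable:,} ({trainable / total:.2%} of 7B)")
# prints: Trainable: 8,388,608 (0.12% of 7B)
```

Adapting more matrices (k/o projections, MLP layers) or raising the rank pushes the share toward the upper end of the 0.1~1% range.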
Data Requirements Guide
# Recommended data volume by fine-tuning task
data_requirements = {
    "Text classification": {"min": "500", "recommended": "2,000~5,000", "note": "100+ per class"},
    "Sentiment analysis": {"min": "1,000", "recommended": "5,000~10,000", "note": "Label balance important"},
    "Summarization": {"min": "1,000 pairs", "recommended": "5,000~20,000 pairs", "note": "Source-summary pairs"},
    "Dialogue (chatbot)": {"min": "500 convos", "recommended": "3,000~10,000 convos", "note": "Include diverse scenarios"},
    "Domain adaptation": {"min": "5,000", "recommended": "10,000~50,000", "note": "Domain text"},
    "Code generation": {"min": "2,000 pairs", "recommended": "10,000~50,000 pairs", "note": "Description-code pairs"},
}

for task, info in data_requirements.items():
    print(f"\n[{task}]")
    print(f"  Minimum: {info['min']}, Recommended: {info['recommended']}")
    print(f"  Note: {info['note']}")
Key principle: quality matters more than quantity. A well-curated set of 1,000 samples beats 10,000 noisy ones. Always review your data manually before fine-tuning.
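Basic automated checks catch the worst noise before the manual review. A minimal curation sketch, assuming instruction/response pairs stored as dicts; the exact-duplicate and minimum-length checks are illustrative thresholds, not fixed rules:

```python
def curate(samples, min_response_chars=20):
    """Drop exact duplicates and near-empty responses from a sample list."""
    seen = set()
    kept = []
    for s in samples:
        key = (s["instruction"].strip(), s["response"].strip())
        if key in seen:
            continue  # exact duplicate of an earlier sample
        if len(s["response"].strip()) < min_response_chars:
            continue  # response too short to be a useful training signal
        seen.add(key)
        kept.append(s)
    return kept

samples = [
    {"instruction": "Summarize the report", "response": "The report covers Q3 revenue trends in detail."},
    {"instruction": "Summarize the report", "response": "The report covers Q3 revenue trends in detail."},  # duplicate
    {"instruction": "Summarize the memo", "response": "ok"},  # too short
]
print(len(curate(samples)))  # prints: 1
```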
Fine-Tuning Cost Estimation
For a 7B model, LoRA fine-tuning typically takes a few hours on a single A100. Cloud GPU costs (Lambda, RunPod, etc.) are around $1~3 per A100-hour, so expect approximately $5~10 per training run.
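The estimate is simply hours multiplied by the hourly GPU price. A back-of-envelope sketch with illustrative defaults (real numbers vary with dataset size, sequence length, and provider):

```python
def training_cost(hours, usd_per_hour, runs=1):
    """Rough cloud-GPU cost: wall-clock hours x hourly price x number of runs."""
    return hours * usd_per_hour * runs

# Example: 3-hour LoRA run at $2/hour, repeated for 2 hyperparameter sweeps
print(f"${training_cost(3, 2.0, runs=2):.2f}")  # prints: $12.00
```

Budgeting for several runs is realistic: the first attempt rarely uses the final hyperparameters.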
Today’s Exercises
- Choose one of your real-world tasks, follow the decision tree above, determine whether prompting/RAG/fine-tuning is appropriate, and document the reasoning.
- Research the parameter counts and model sizes (GB) of 7B, 13B, and 70B models on Hugging Face Hub, then estimate the GPU memory required for Full FT/LoRA/QLoRA for each.
- Investigate how fine-tuning data can be collected in your domain (medical, legal, finance, etc.) and document at least 3 data sources.
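For exercise 2, a rule-of-thumb memory sketch can serve as a starting point. The bytes-per-parameter floors below are assumptions consistent with the comparison table above (activations, optimizer state, and adapter weights add more on top, hence the "+"):

```python
# Rough floors: Full FT = fp16 weights (2) + fp16 gradients (2) -> 4 bytes/param;
# LoRA = fp16 frozen weights (2) plus tiny adapters; QLoRA = 4-bit weights (0.5).
BYTES_PER_PARAM = {"Full FT": 4.0, "LoRA": 2.0, "QLoRA": 0.5}

def min_memory_gb(n_params_billion, method):
    # 1B params x 1 byte is roughly 1 GB, so billions x bytes/param ~= GB
    return n_params_billion * BYTES_PER_PARAM[method]

for size in (7, 13, 70):
    row = ", ".join(f"{m}: {min_memory_gb(size, m):g}GB+" for m in BYTES_PER_PARAM)
    print(f"{size}B -> {row}")
```

The 7B Full FT floor (28GB+) matches the comparison table; compare your Hugging Face Hub findings against these floors to see how much headroom activations and optimizer state actually need.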